RECOMMENDED MODIFICATIONS TO IMPROVE
CVSD SPEECH-ENCODING PERFORMANCE


INTRODUCTION 

Improving the speech intelligibility and quality of the continuously-variable-slope
delta modulator (CVSD) for certain relatively low-data-rate speech-encoding
applications is an important problem in the Navy and other DOD branches [1]. The
ultimate objective of this study is a modification or technique that improves CVSD
performance while requiring only a low-to-moderate increase in complexity or
system-integration chores. This report discusses several CVSD modifications that
warrant theoretical and experimental evaluation as performance-improving techniques.
These candidate techniques were derived from a survey of the applicable research
literature on delta modulation (ΔM) and other waveform-encoding techniques.

The report begins with some background information on low-data-rate speech
encoding, basic delta modulation, and the CVSD. Next, the proposed techniques
warranting evaluation are discussed, together with the relevant research references
and their implications. Then the proposed modifications to the CVSD are compared
with the current CVSD with respect to implementation complexity, decoder output
SNR, and input power dynamic range. Finally, the conclusion contains a summary and
a recommended preference ordering of the proposed techniques.

BACKGROUND 

The Desire for Low-Data-Rate Speech Encoding

Secure and nonsecure digital voice transmissions are increasingly required over links
such as HF radio, mobile radio, and switched analog telephone networks, which permit
transmission rates of only 10 kb/s or less [2,3]. In keeping with current
data-transmission-rate classifications, digitally encoded speech transmitted at rates
of 10 kb/s or less is considered low-data-rate speech [3,4].

The  Navy  desires  low-data-rate  speech  encoders  for  application  to  several  important 
voice  communication  systems  which  operate  in  a variety  of  transmission  environments. 
Other  DOD  agencies  and  civilian  communication  companies  share  some  of  the  same 
applications.  These  systems  and  environments  include: 

• Satellite links and other wideband channels, which are capable of multiplexing
several narrowband channels and are better used by low-data-rate encoders [5,6];

• Transmission media subjected to restrictive spectrum allocation and conservation
specifications;

• RF transmission links having fixed and limited available power, which can achieve
superior SNR at the receiver by using a low data rate;

• Covert communication situations in which the principal approach is to transfer
the minimum required information at the minimum practical data rate [7]; and

• Underwater acoustic communication channels having characteristics which admit
only low transmission rates.

Only two widely known techniques will provide the required low data rates: spectral
deconvolution techniques (e.g., vocoders and other analysis/synthesis methods
[3,5,6,8,9]) and ΔM techniques [3].


ΔM Techniques


The basic ΔM encoder (Fig. 1) consists of an adder, a hard limiter (a comparator
with binary output), a sampler, and an integrator. All of these components are simply
and cheaply implementable. The ΔM decoder at the receiver is just the integrator of the
ΔM encoder feedback loop, followed by a low-pass filter.


Fig. 1. Basic delta modulator (block diagram; labeled inputs: ANALOG IN and CLOCK)
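
The encoder and decoder structure described above can be illustrated with a short
Python sketch. It is a simplified model, not the circuit of Fig. 1: the function names,
the fixed step size, the 8-kHz sampling rate, and the moving-average stand-in for the
low-pass filter are all assumptions made for illustration.

```python
import math

def dm_encode(samples, step):
    """Basic delta modulator: comparator plus integrator in a feedback loop."""
    bits = []
    estimate = 0.0                            # integrator output (staircase)
    for x in samples:
        bit = 1 if x >= estimate else 0       # hard limiter on the difference
        estimate += step if bit else -step    # integrator moves one step
        bits.append(bit)
    return bits

def dm_decode(bits, step, smooth=4):
    """Decoder: the same integrator, then a crude moving-average low-pass."""
    staircase = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        staircase.append(estimate)
    smoothed = []
    for i in range(len(staircase)):
        window = staircase[max(0, i - smooth + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return staircase, smoothed

# Example: a 100-Hz tone sampled at an assumed 8 kHz, step size 0.1.
fs = 8000
signal = [math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
bits = dm_encode(signal, step=0.1)
staircase, smoothed = dm_decode(bits, step=0.1)
```

With these values the largest per-sample change of the tone is below the step size, so
the staircase never falls into slope overload and stays within one step of the input.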


The ΔM can be viewed as a one-bit differential-pulse-code-modulation (DPCM)
system with feedback, in which good reconstruction performance at one bit per sample
depends on the input signal being highly correlated at the sampling rate. For
sampling rates above 32 kb/s the correlation of speech samples is high enough for the
basic ΔM and/or the current adaptive ΔM to perform with acceptable intelligibility and
quality. Adaptive ΔM increases the accuracy (generally in a mean-squared-error sense)
of the reconstructed signal by varying the output data pulse heights (effectively the
encoder step size) at the integrator input. This variation is usually based on the
input signal’s slope or average level [3-5,8,10-26]. This added degree of freedom
allows adaptive ΔM to yield intelligible speech at 16 kb/s, even through noisy channels.


The Current Navy ΔM: CVSD

The CVSD is an adaptive ΔM currently under consideration for use as a speech
waveform digitizer by the Navy and several other DOD agencies [14]. In the CVSD the
slope-overload level of the encoder is varied at a syllabic rate. Consequently,
subjective performance-degrading effects associated with instantaneous companding or
transmission errors are further reduced, and the input speech dynamic range is improved.

The slope control signal for adjusting the slope-overload level is derived directly
from the three most recent encoder output bits, thereby allowing the ΔM output
pulse height (step size) compression and expansion, performed at the encoder and
decoder, to track very well without the necessity of additional speech-signal-envelope
information. A slope-overload condition is assumed if the three most recent output
bits are alike. This occurs when the input speech signal is increasing or decreasing so
fast that the reconstructed signal cannot keep up with it. In this situation the CVSD
encoder (the step-size algorithm) temporarily increases the step size, so that the
reconstructed signal more closely tracks the original signal. During nonoverload
conditions the step size decays exponentially to a level which minimizes granular
noise. The decay time constant is comparable to a syllabic time interval. The maximum
and minimum step-size levels are chosen for subjectively good all-round performance.
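
The step-size rule just described (three like bits signal overload; otherwise the
step decays toward a minimum) can be sketched as follows. This is a minimal sketch
under stated assumptions: the additive attack, the multiplicative decay, and every
constant below are illustrative choices, not parameters of any fielded CVSD.

```python
STEP_MIN, STEP_MAX = 0.01, 0.5   # assumed step-size limits
ATTACK = 0.02                    # assumed increment while in overload
DECAY = 0.99                     # assumed per-sample decay factor

def next_step(step, last3):
    """Grow the step if the three latest bits agree; otherwise let it decay."""
    if len(last3) == 3 and len(set(last3)) == 1:
        return min(step + ATTACK, STEP_MAX)
    return max(step * DECAY, STEP_MIN)

def cvsd_encode(samples):
    bits, last3, step, est = [], [], STEP_MIN, 0.0
    for x in samples:
        bit = 1 if x >= est else 0
        bits.append(bit)
        last3 = (last3 + [bit])[-3:]      # keep only the three latest bits
        step = next_step(step, last3)
        est += step if bit else -step
    return bits

def cvsd_decode(bits):
    """The decoder repeats the same step-size rule from the bits alone."""
    out, last3, step, est = [], [], STEP_MIN, 0.0
    for bit in bits:
        last3 = (last3 + [bit])[-3:]
        step = next_step(step, last3)
        est += step if bit else -step
        out.append(est)
    return out
```

Because `next_step` sees only the transmitted bits, encoder and decoder step sizes
stay in agreement without any side information, which is the tracking property the
text attributes to the CVSD.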


CVSD  MODIFICATIONS  WARRANTING  EVALUATION 
Optimizing  Parameters  to  Other  Performance  Measures 

The standard performance measure used for ΔM designs is the signal-to-quantization-
noise power ratio (SNR), with quantization noise including the slope-overload distortion
(“noise”) in addition to the granular noise. As usual, the popularity of SNR in this
context is due to its measurability and mathematical tractability. However, the response
of the human ear is not matched to an objective performance measure such as the SNR;
consequently many researchers have sought more subjective performance measures, often
with little success. Nevertheless some encouraging results have been reported and warrant
further evaluation in the desired context. Proposed in particular is the evaluation of
performance measures incorporating the effects of past, present, and future decisions
and of performance measures favoring slope-overload noise over granular noise.
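
For reference, the SNR discussed above is commonly written as below. The symbols are
assumptions introduced here, not notation from the report: x_n is the input sample and
\hat{x}_n the decoder output at time step n. The denominator lumps slope-overload and
granular errors together, which is why a single SNR figure cannot distinguish them.

```latex
\mathrm{SNR} = 10 \log_{10}
  \frac{\sum_{n} x_n^{2}}{\sum_{n} \left( x_n - \hat{x}_n \right)^{2}}
  \quad \text{(dB)}
```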


Performance Measures Incorporating the Effects of Past,
Present, and Future Decisions

Acknowledging the past, present, and future effects of each coding decision allows
past and/or future decisions and their effects to influence the current decision. As a
result, more control is gained over the cumulative effects of each decision. Zetterberg
and Uddenfeldt [4] consider a delayed-decision approach in which the current decision
on the step size and the ΔM output at time step n is based on the m delayed input
samples $\{x_n, \ldots, x_{n+m-1}\}$. Two performance measures are employed; both are
weighted sums of the mean-squared reconstruction errors incurred over those m delayed
samples. The measure that weights more heavily against the error (quantization-noise
power) within the speech band gives better SNR performance, since decoder output
filtering is then more effective.
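
The idea of a delayed decision can be illustrated with a brute-force sketch. It is
not the algorithm of [4]: it assumes a fixed step size, an unweighted sum of squared
errors, and an exhaustive search over all bit patterns for the next m samples, all of
which are simplifications chosen for clarity.

```python
from itertools import product

def delayed_decision_encode(samples, step, m=4):
    """Pick each bit by looking ahead over up to m input samples."""
    bits, est = [], 0.0
    for n in range(len(samples)):
        window = samples[n:n + m]            # x_n ... x_{n+m-1}
        best_bit, best_cost = 1, float("inf")
        for candidate in product((1, 0), repeat=len(window)):
            e, cost = est, 0.0
            for x, b in zip(window, candidate):
                e += step if b else -step    # trial reconstruction
                cost += (x - e) ** 2         # unweighted squared error
            if cost < best_cost:
                best_bit, best_cost = candidate[0], cost
        bits.append(best_bit)                # commit only the first bit
        est += step if best_bit else -step
    return bits
```

The search cost grows as 2^m per output bit, so the sketch is practical only for
small m; a weighted error measure would replace the `cost` line.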

Song [21] and Song et al. [22] use approximate minimum mean-squared-error
simulation (at the decoder) and one-step-ahead prediction (in the encoder feedback
loop) algorithms to reconstruct a Gaussian-Markov input signal. The current predictor
value is conditioned on the last two estimator values and the last two ΔM output
values. Simulations of their method show an SNR quite insensitive to the input signal
power level. The SNR level is also comparable to that of other ΔM configurations
operating at their optimum signal power level.

Jayant [18] instantaneously adapts the step size on the basis of a comparison
between the two latest ΔM output digits. If the two digits are the same, the step size is
changed by the factor P > 1; otherwise the factor is -Q, where PQ = 1.
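
A minimal sketch of this one-bit-memory adaptation follows. It tracks the step
magnitude and lets the output bit supply the sign, which is how the signed factor -Q
is read here; that reading, the clamping limits, and the default values are
assumptions rather than details taken from [18].

```python
def jayant_encode(samples, p=1.5, step0=0.05, step_min=1e-3, step_max=1.0):
    """Adapt the step magnitude from the two latest output bits."""
    bits, steps = [], []
    est, step, prev = 0.0, step0, None
    for x in samples:
        bit = 1 if x >= est else 0
        if prev is not None:
            # Same bits: multiply by P. Different bits: multiply by Q = 1/P.
            step *= p if bit == prev else 1.0 / p
            step = min(max(step, step_min), step_max)
        est += step if bit else -step        # the bit supplies the sign
        bits.append(bit)
        steps.append(step)
        prev = bit
    return bits, steps
```

A run of equal bits makes the step grow geometrically, while alternating bits shrink
it just as quickly, which is the "instantaneous" behavior contrasted with the
syllabic adaptation of the CVSD.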


Performance  Measures  Favoring  Slope-Overload 
Noise  Over  Granular  Noise 

Several researchers have investigated the perception of slope-overload noise and the
subjective preference for slope-overload noise over granular noise. The results of Jayant
and Rosenberg [27] indicate that speech samples exhibiting the minimum degradation on
an objective quantization-noise-power basis are not subjectively the most preferred
samples. They also show that a subjectively preferred ΔM generates more slope overload
and less granularity than the objectively optimum ΔM. As a possible explanation Jayant
and Rosenberg suggest that granularity is explicitly perceivable as background noise,
while slope-overload “noise” exists only in relation to an original signal which is not
known to the listener. They feel that this preference for more slope overload and less
granularity may be more significant at data rates lower than those considered.

Levitt et al. [28] investigated the perception of speech distortion resulting from
slope overload. Their research demonstrates the inappropriateness of the SNR as a
viable performance measure, since there can be a substantial variation of the SNR at
the level of just-perceptible distortion. They also recommend a more appropriate
parameter for slope-overload distortion, which is a function of the truncated portion
of the input signal’s time derivative beyond the maximum absolute slope level of the ΔM.
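
The quantity underlying such a parameter, the part of the input slope that exceeds
what the modulator can follow, can be computed as sketched below. The report does not
give the function used in [28]; the mean-square summary here, the finite-difference
derivative, and the function names are assumptions made purely for illustration.

```python
def slope_excess(samples, fs, step):
    """Per-interval input slope beyond the modulator's maximum slope."""
    max_slope = step * fs                    # one step per sample period
    excess = []
    for a, b in zip(samples, samples[1:]):
        slope = (b - a) * fs                 # finite-difference derivative
        excess.append(max(abs(slope) - max_slope, 0.0))
    return excess

def overload_measure(samples, fs, step):
    """Assumed summary statistic: mean square of the excess slope."""
    excess = slope_excess(samples, fs, step)
    if not excess:
        return 0.0
    return sum(e * e for e in excess) / len(excess)
```

The measure is zero whenever the input never outruns the modulator, regardless of
how much granular noise is present, so it isolates the overload component.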

Greefkes [17] used these results in designing a ΔM system for which intelligible
speech at 7.2 kb/s is claimed. His system attempts to keep the ΔM in a slope-overload
condition more often rather than allow granular noise to occur.

Applying Nonbinary and Nonsynchronous
Signaling Techniques

It is well known that speech signals are highly redundant and contain a large
percentage of quiet periods. Eger and Campanella [29] claim that 60% of conversational
speech is quiet. The standard binary synchronous output waveform is not particularly
suited to coding a speech waveform. For example, the only means of representing
constant-speech-level intervals is by an alternating synchronous waveform, which yields a
substantial granular noise component. In addition the CVSD, like other adaptive ΔM
systems, suffers rapid performance degradation in noisy-channel environments, which
makes this otherwise attractive speech digitization technique unsatisfactory for some
important applications. As approaches to solving the problem of granular noise,
techniques of nonbinary waveform coding and amplitude sampling more closely matched to
the input waveform characteristics are proposed; and as approaches to solving the
problem of noisy channels, coding techniques which add redundancy without substantially
increasing the desired low signaling rate are proposed.


Waveform  Coding  and  Amplitude  Sampling  More  Closely 
Matched  to  the  Input  Waveform  Characteristics 

Speech redundancy can be further exploited by replacing the binary digit code with
a ternary digit code or by applying adaptive sampling techniques. Inose et al. [30]
theoretically and experimentally demonstrate that a ternary digit ΔM output code with
symbols -1, 0, and 1 uses the symbols -1 and 1 at less than 2/5 the rate of a binary
digit code (-1, 1) for a white-noise signal and equivalent output SNR. It follows that
a further reduction of the ±1 symbol rate is expected when speech signals are considered,
since speech consists of a high percentage of fairly-constant-level periods. The immediate
result is a reduction in granular noise at the expense of an additional decision level.
Halijak and Tripp [31] suggest a technique which may give similar results; they
recommend that the ΔM comparator have a small dead zone to reduce granular noise.
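
A ternary output code and a comparator dead zone can be combined in one short
sketch. It is only an illustration of the idea: the fixed step size, the symmetric
dead zone, and the function name are assumptions, and it does not reproduce the
systems of [30] or [31].

```python
def ternary_dm_encode(samples, step, dead_zone):
    """Emit -1, 0, or +1; 0 means the estimate is left unchanged."""
    symbols, est = [], 0.0
    for x in samples:
        diff = x - est
        if diff > dead_zone:
            s = 1
        elif diff < -dead_zone:
            s = -1
        else:
            s = 0                  # inside the dead zone: no step taken
        est += s * step
        symbols.append(s)
    return symbols

# A constant input settles to a run of zeros instead of alternating bits.
print(ternary_dm_encode([1.0] * 4, step=0.5, dead_zone=0.25))
```

Once the estimate is within the dead zone of a constant input, the encoder emits only
zeros, so the alternating pattern that produces granular noise in the binary case
does not appear.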

Adaptive sampling (asynchronous sampling or, equivalently, aperiodic sample times)
of the speech signal is another technique that has been investigated for better matching
of the encoding system to the speech waveform characteristics [30,32,33]. These
techniques are based on taking a sample whenever the speech signal or its derivative
reaches a certain threshold value. The ternary digit system of Inose et al. [30] shows
some performance improvement when the ±1 symbols from the (-1, 0, 1) code are
transmitted at the instant of certain amplitude threshold crossings. Hawkes and
Simonpieri [32] describe an asynchronous ΔM system with memory. Their system performs
below that of the “ideal” asynchronous system [30] but does not require timing
information to be transmitted to the receiver. As an example, if the last two ΔM output
digits have the same sign, then the input is assumed to be growing in absolute value,
and the sampling interval should be reduced. Conversely, if the last two output digits
do not have the same sign, then the input is assumed to be relatively constant, and the
sampling interval should be increased. Since the adaptation algorithm depends on only
the ΔM output, the decoder does not need additional information to track the adaptive
sampling times of the encoder. Mostafa and El-Hagry [33] proposed an adaptive sampling
technique for which the sampling times depend on the speech-signal time derivative. The
effect here is an increase in the sampling rate during that portion of the speech
waveform having the larger, higher frequency components or, equivalently, an increase
in the sampling rate when the speech waveform has large and rapid changes.
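
The sampling-interval rule quoted above as an example (like digits shorten the
interval, unlike digits lengthen it) can be sketched as follows. The interval limits,
the ratio of 1.5, the starting interval, and the function names are assumptions; the
sketch shows only that a decoder can regenerate the sampling times from the bits.

```python
import math

T_MIN, T_MAX = 25e-6, 400e-6   # assumed interval limits, in seconds
FACTOR = 1.5                   # assumed shrink/grow ratio

def next_interval(interval, prev_bit, bit):
    """Like bits: sample sooner. Unlike bits: sample later."""
    if bit == prev_bit:
        return max(interval / FACTOR, T_MIN)
    return min(interval * FACTOR, T_MAX)

def encode(signal, duration, step, t0=100e-6):
    bits, times = [], []
    t, est, interval, prev = 0.0, 0.0, t0, None
    while t < duration:
        bit = 1 if signal(t) >= est else 0
        est += step if bit else -step
        bits.append(bit)
        times.append(t)
        if prev is not None:
            interval = next_interval(interval, prev, bit)
        prev = bit
        t += interval
    return bits, times

def decoder_times(bits, t0=100e-6):
    """Rebuild the encoder's sampling instants from the bit stream alone."""
    times, t, interval, prev = [], 0.0, t0, None
    for bit in bits:
        times.append(t)
        if prev is not None:
            interval = next_interval(interval, prev, bit)
        prev = bit
        t += interval
    return times

tone = lambda t: math.sin(2 * math.pi * 300 * t)
bits, times = encode(tone, duration=0.01, step=0.1)
```

Here `decoder_times(bits)` reproduces `times`, illustrating why no separate timing
information needs to be sent when the adaptation depends only on the output digits.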


Channel  Coding  Techniques  to  Improve  Noisy 
Transmission  Performance 

The traditional solution to the problem of noisy channels is to apply the standard
channel coding techniques which add redundancy digits to the output code. However,
for the desired applications the resulting increase in signaling rate is undesirable.
Therefore a technique for adding redundancy may be beneficial if it does not
substantially increase the signaling rate. In other words, a desirable technique would
transform the output code into a code with the same signaling rate and with known
correlative properties that may allow error detection and correction.

Lender [34-36], Sekey [37], and Wolf [38] have investigated some of these
correlation-introducing techniques. Lender and Sekey presented a technique known as the
polybinary technique, which transforms a binary digit sequence into an n-ary digit
sequence (n > 2) with the same signaling rate. The transformation creates correlation
within the n-ary digit sequence in a specified manner, thus allowing error detection.
The case of n = 3, known as the duobinary technique, appears to be the most developed
[34-37]. This property of the polybinary technique is gained, however, at the expense of
a decrease in threshold-level widths, since n - 1 threshold levels are required.
Nevertheless it is uncertain whether the duobinary (polybinary) technique with some form
of error correction (such as assigning a likely speech waveform state or value based on
context when an error is detected) performs better than the standard binary digit code
for non-error-free transmission requirements.
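
The three-level (duobinary) case can be sketched in a few lines. This follows the
textbook precoded duobinary scheme rather than any specific design in [34-37]; the
zero reference bit, the level alphabet {0, 1, 2}, and the simple consistency check
used for error detection are assumptions made for the example.

```python
def duobinary_encode(data_bits):
    """Precode, then add adjacent precoded bits to get levels 0, 1, 2."""
    levels, prev = [], 0                 # assumed reference bit of 0
    for d in data_bits:
        p = d ^ prev                     # precoder: p_k = d_k XOR p_(k-1)
        levels.append(p + prev)          # one level per input bit
        prev = p
    return levels

def duobinary_decode(levels):
    """Recover data as level mod 2 and flag levels that break the code rule."""
    data, errors, prev = [], [], 0
    for k, c in enumerate(levels):
        data.append(c % 2)
        p = c - prev                     # precoded bit implied by this level
        if p not in (0, 1):              # impossible transition: error seen
            errors.append(k)
            p = min(max(p, 0), 1)        # crude resynchronisation
        prev = p
    return data, errors

sent = duobinary_encode([1, 0, 1, 1])    # one level per data bit
corrupted = list(sent)
corrupted[1] = 0                         # simulate a channel error
print(duobinary_decode(sent), duobinary_decode(corrupted))
```

The output has one level per input bit, so the signaling rate is unchanged. The check
catches only those errors that produce an impossible level transition; other errors
pass undetected, which is consistent with the open question raised in the text.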

Wolf  investigated  a more  general  transformation  of  binary  sequences  which  included 
the  polybinary  technique.  Some  form  of  error  correction  is  implied  when  he  discusses 
the  need  for  decoding  decisions  based  on  information-data  redundancy  and  on 
code-transformation  knowledge. 


Adaptive  Filtering  of  Reconstructed  Speech  Signals 

Eger and Campanella [29] have proposed adaptively filtering a speech signal to reduce
the average bandwidth required in adaptive-channel-bandwidth multiplexed systems.
They exploit the redundancy and quiet periods of speech, which are characterized by an
average conversational speech signal that is 60% quiet, with 10.4% of the signal confined
to a 1-kHz bandwidth, 13.2% to a 2-kHz bandwidth, and 16.4% to a 4-kHz bandwidth.
Likewise, adaptive filtering may also be used to reduce the subjectively objectionable
granular noise in the ΔM reconstructed speech signal. In this case an adaptive-filter
bandwidth would better match the short-term conditions on a subjective and an objective
basis.

If the filter adaptation algorithm is based solely on the ΔM encoder output data
stream, as is the step-size algorithm, then additional information need not be transmitted
to the ΔM decoder. For example, the adaptation may lie in choosing one out of four
filters having bandwidths of 0 kHz, 1 kHz, 2 kHz, and 4 kHz. Goodman [15] and
Goodman and Greenstein [39] have already shown all-digital implementations of delta
modulators. Consequently filter adaptation may be controlled by all-digital methods.


COMPARISON  OF  MODIFICATIONS  WITH  CVSD 
Implementation  Complexity 

Asynchronous data rates and adaptive sampling techniques require asynchronous
transmission, encryption, decryption, demodulation, and reconstruction when buffering
and data time-labeling methods are not applied. Consequently the proposed techniques,
if acceptable under TRANSEC (transmission security) considerations (a concern because
of possible intelligence unveiling), may not be readily acceptable from a
systems-integration viewpoint (buffering and its consequences may be preferred) unless
their performance improvement justifies the interfacing equipment modification. The
increase in complexity at the encoder and decoder is expected to be small-to-moderate
for asynchronous encoding and decoding alone. Most of this increase will probably be at
the encoder.

The increase in encoder and decoder complexity for synchronous ternary digit
coding is expected to be about the same as or less than that for the asynchronous
binary digit case. However, the systems-integration task may be acceptable and certainly
easier if the ternary symbols are mapped into quaternary symbols, thereby allowing more
easily implementable four-level synchronous transmission, encryption, decryption, and
demodulation. Asynchronous ternary digit coding may require twice the complexity
increase required by the synchronous method.

Real-time  adaptive  algorithms  based  on  other  than  instantaneous  mean-squared- 
error  performance  measures,  possibly  incorporating  past  and/or  future  data,  require 
storage  and  computing  capability  at  the  encoder  and  decoder.  This  task  may  be  easily 
accomplished  by  a basic  microprocessor  or  by  some  simple  arrangement  of  standard 
logic  blocks. 

Adaptive-filter implementation at the decoder would probably require a
low-to-moderate increase in complexity. The adaptive filter may be implemented using
analog or digital techniques, although some combination would be more likely. The
adaptive filter may be a basic analog filter with digitally alterable parameters or
components and thereby capable of realizing N desired filters. The adaptive-filter
control signal may be derivable from the received ΔM output data stream. If so, the
adaptive-filter control algorithm may be adequately realized by all-digital techniques,
since only one out of N filter configurations is chosen at each decision stage.

SNR Performance

Although it has been stated here and elsewhere that the subjective performance of
a ΔM system is not directly related to the decoder output SNR, the obvious indirect
relationship (e.g., a large SNR yields intelligible and good-quality speech), as well as
other attributes, makes the SNR a universal performance measure. Hence some SNR
performance comparisons of these proposed techniques with CVSD are in order.

The SNR should be improved for adaptive-sampling, adaptive-filtering, and
ternary-digit-coding techniques. The granular noise would be considerably reduced for
ternary digit coding or adaptive filtering, and the slope-overload noise would be reduced
for adaptive sampling. Better SNR would also be expected when ΔM parameter optimization
is for performance measures weighting against a reconstruction error function
incorporating past or future data. However, the SNR may not be better for performance
measures weighting against none or only part of the reconstruction error; for example,
weighting against only the granular noise may improve the subjective performance while
decreasing the SNR by increasing the slope-overload noise.


Input  Dynamic  Range 

The criterion for selecting adaptive sampling times and for selecting the no-change
levels (the 0 symbol) in ternary digit coding determines the low-level-input-signal
sensitivity of ΔM using those techniques. However, both techniques give a no-signal
indication (101010... for adaptive sampling and 000000... for ternary coding) when the
lowest step-size threshold is not exceeded. Consequently the low-level-input-signal SNR
performance for these techniques should be better than CVSD performance for the same
lowest step-size threshold just by virtue of the additional degree of freedom.

The input dynamic range performance (SNR vs input signal level) falls off at high
input-signal levels, depending on the largest step size and the adaptive step-size
algorithm. For equivalent step sizes and algorithms the adaptive sampling techniques
should show a more gradual falloff, since another degree of freedom is allowed for SNR
optimization (fewer constraints are imposed). Likewise, the ternary digit coding
technique should show improved performance at high input-signal levels, since granular
noise is reduced.

The adaptive-filtering technique should show improved dynamic range performance
over CVSD, since reconstruction “noise” (error) outside the current adaptive-filter
bandwidth is substantially reduced. As a result the average “noise” power is reduced.

Finally, those techniques using performance measures (incorporating reconstruction errors
averaged over some past and/or future data) should also have better input-dynamic-range
performance compared with the CVSD. This follows because, for equivalent step sizes,
more data are available at each decision point.


SUMMARY AND CONCLUSIONS

In this report several techniques or modifications were proposed for improving
CVSD performance at low data rates. Low data rates were defined here as 10 kb/s or
less. A survey of available research literature on delta-modulation and speech-encoding
techniques was the basis for this report. The proposed modifications to the CVSD are
characterized as follows:

• Adaptive  (asynchronous)  sampling  of  the  input  speech  signal; 

• Adaptive  filtering  of  the  reconstructed  speech  signal; 

• Optimization  of  CVSD  parameters  and/or  decisions  to  performance  measures 
incorporating  past  and/or  future  data; 

• Optimization  of  CVSD  parameters  and/or  decisions  to  performance  measures 
favoring  slope-overload  “noise”  over  granular  noise;  and 

• Coding of encoder data with nonbinary codes such as ternary or correlative
(e.g., polybinary) codes.

These techniques were further explained, and the justification for their consideration
was discussed. For comparison, some assessment of their impact on CVSD complexity,
input dynamic range, and SNR was also made.

In  terms  of  a likelihood  of  increased  complexity  and  additional  system  integration 
chores,  we  recommend  the  following  preference  ordering  as  a guide  for  further  evaluation 
of  the  proposed  techniques: 

1a. Adaptive filtering of the reconstructed speech signal or

1b. Optimization of the CVSD parameters and/or decisions with respect to
performance measures favoring slope-overload “noise” over granular noise;

2a. Coding of encoder data with nonbinary codes such as ternary or correlative
codes or

2b. Optimization of the CVSD parameters and/or decisions with respect to
performance measures incorporating past and/or future data; and

3. Adaptive (asynchronous) sampling of the input speech signal.

Therefore,  as  a goal,  this  report  has  attempted  to  indicate  that  further  theoretical 
and  experimental  evaluations  of  these  techniques  for  improving  CVSD  performance  are 
warranted  based  on  probable  practical  acceptability  and  on  the  implications  of  pertinent 
published  research  results. 